perm filename TUT[HPP,DBL] blob sn#195037 filedate 1976-01-05 generic text, type C, neo UTF8
COMMENT ⊗   VALID 00006 PAGES
C REC  PAGE   DESCRIPTION
C00001 00001
C00002 00002	.DEVICE XGP
C00004 00003	Some of the most exciting and challenging tasks that you and I face
C00017 00004	Now let's run our imaginary system forward,
C00024 00005	Suppose you wanted to write a program which did this kind of heuristic
C00038 00006	Well, in any event, that was how AM was designed and created. You now know
C00048 ENDMK
C⊗;
.DEVICE XGP

.FONT 1 "BASL30"
.FONT 2 "BASB30"
.FONT 4  "BASI30"
.FONT 5  "BDR40"
.FONT 6  "NGR25"
.FONT 7  "NGR20"
.FONT A "SUP"
.FONT B "SUB"
.TURN ON "↑α↓_π[]{"
.TURN ON "⊗" FOR "%"
.PAGE FRAME 54 HIGH 80 WIDE
.AREA TEXT LINES 4 TO 53
.COUNT PAGE PRINTING "1"
.TABBREAK
.ODDLEFTBORDER←EVENLEFTBORDER←1000
.AT "ffi" ⊂ IF THISFONT ≤ 4 THEN "≠"  ELSE "fαfαi" ⊃;
.AT "ffl" ⊂ IF THISFONT ≤ 4 THEN "α∞" ELSE "fαfαl" ⊃;
.AT "ff"  ⊂ IF THISFONT ≤ 4 THEN "≥"  ELSE "fαf" ⊃;
.AT "fi"  ⊂ IF THISFONT ≤ 4 THEN "α≡" ELSE "fαi" ⊃;
.AT "fl"  ⊂ IF THISFONT ≤ 4 THEN "∨"  ELSE "fαl" ⊃;
.MACRO B ⊂ BEGIN VERBATIM GROUP ⊃
.MACRO B1 ⊂ BEGIN PREFACE 0 INDENT 0 SINGLE SPACE ⊃
.MACRO E ⊂ APART END ⊃
.MACRO FAD ⊂ FILL ADJUST DOUBLE SPACE PREFACE 2 ⊃
.MACRO FAS ⊂ FILL ADJUST SINGLE SPACE PREFACE 1 ⊃
.FAS
.PAGE←0
.NEXT PAGE
.INDENT 0
.TURN OFF "{∞→}"   
.GROUP SKIP 1
.BEGIN CENTER  PREFACE 0

⊗5↓_AUTOMATED   MATHEMATICIAN_↓⊗*

⊗5↓_TUTORIAL  TALK_↓⊗*


for the Stanford ⊗2Heuristic Programming Project⊗* Workshop

⊗4January 5-8, 1976⊗*


Douglas B. Lenat



.END
.SELECT 1
.EVERY HEADING(⊗7Douglas B. Lenat⊗*,⊗4↓_Automated Mathematician_↓⊗*,⊗7TUTORIAL      page   {PAGE}⊗*)

Some of the most exciting and challenging tasks that you and I face
as scientists are thinking up new research problems, picking one to
work on, and deciding when to quit and try a new one.
This activity has a very different flavor than simply 
solving carefully-specified problems.
Consider the Missionaries and Cannibals problem.
It takes a few basic  abilities to be able to solve it.
But what did it take to ⊗4invent⊗* it? How could someone sit down and
think up a puzzle like that? The abilities needed are quite different
from those required just to solve it.

How could we even start to analyze this ill-structured problem:
how to propose new, interesting questions to investigate?
Let's take another
example, say that of prime numbers. 
If I tell you how they're defined, you can probably go off and find me some.
But how would you formulate some interesting conjectures involving primes?
To be even more demanding, let's ask
how someone might have been led to ⊗4propose⊗* that definition in the
first place. How could you make a discovery like that? Why would you think that
such numbers were worth looking into?
Think about that for a few seconds.
I'll let you assume you
already know about factoring a number, finding all its divisors.

(pause 5 seconds)

What would be some natural questions to ask, say right after you'd
come across the concept of finding the divisors of a number?

(pause 5 seconds)

A natural sort of question to ask is "Which integers have an unusual
number of divisors?"  In particular, you'd get around to asking which
numbers have an abnormally small number of divisors.
The answer, of course, is: those numbers which are prime.
This heuristic would be your justification for giving the primes a special
name, and for spending some time investigating them.

So we have reduced the problem of
"how in the world could you discover prime numbers"
to the simpler one of 
"how in the world could you discover 
the concept of divisors-of". The reduction was accomplished by citing
<heur1 slide>
the general heuristic rule that "if f is an interesting function,
it's worth investigating those elements
that f transforms into ⊗4extremal⊗* elements."
Using more mathematical jargon, one could say 
"the inverse image of an interesting set under an interesting function
is probably interesting itself".
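This heuristic is easy to make concrete. Here is a minimal sketch (not AM's actual code, which was a Lisp program) of applying it: take the interesting function number-of-divisors, and keep just those elements it maps to extremal values.

```python
# A sketch of the "inverse image of extremal elements" heuristic:
# apply an interesting function -- here, number-of-divisors --
# and keep the elements mapped to extremal (minimal) values.
def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]

candidates = range(2, 50)
fewest = min(len(divisors(n)) for n in candidates)
extremal = [n for n in candidates if len(divisors(n)) == fewest]
# The extremal elements turn out to be exactly the primes below 50.
```

Running this, `extremal` contains precisely the primes below 50, which is the heuristic's justification for singling them out.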

Now we're beginning to see
what it might mean  to "explain" a discovery. We  just keep reducing it to
simpler and simpler discoveries, using some general heuristic rules like these,
until our problem is reduced to "discovering" a bunch of concepts which we
already know.

We'll continue with this example a little further.
How could you discover the concept of "divisors-of"; i.e., factoring?
Suppose you believe in the
heuristic "if f is an interesting function, consider its inverse". 

(pause 5 seconds)

This ⊗4reduces⊗* the problem to that of discovering ⊗4multiplication⊗*,
because finding the factors of a number is the inverse of multiplying
two numbers.

Multiplication, in turn, could be discovered from more primitive concepts, like 
the cross-product of two sets, and cardinality ("counting").
The heuristic in this case might be a desire to "complete the square"
for a diagram like this one:

.B
    PAIRS OF SETS →→→→→→→ Counting →→→→→→→ PAIRS OF NUMBERS
         ↓                                       ↓
         ↓ Cross-product                         ↓ (?)
         ↓                                       ↓
        SETS →→→→→→→→→→→→ Counting →→→→→→→→→→ NUMBERS

.E

This says that Counting takes a set into a number (namely,
the length of that set).
Cross-product transforms
a pair of sets into a single set (namely, their cross-product). And counting
also takes a pair of sets into a pair of numbers.
It is natural to ask what this unknown function here is, which takes
a pair of numbers into a number, and corresponds to cross-product.

The unknown function turns out to be multiplication.
Of course, the algorithm that this would give you for multiplication
would be atrocious: given two numbers, construct 2 sets with those lengths,
compute the cross-product of those two sets, and count the number of
elements in that cross-product.
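That atrocious algorithm can be written down directly. The sketch below (illustrative only) builds two sets of the given lengths, forms their cross-product, and counts.

```python
from itertools import product

# The "atrocious" multiplication algorithm described above:
# construct sets of the given lengths, take their cross-product,
# and count the elements of the result.
def times(a, b):
    s1 = set(range(a))            # any set of length a will do
    s2 = set(range(b))            # any set of length b
    return len(set(product(s1, s2)))
```

For instance, `times(3, 4)` counts the 12 pairs in a 3-by-4 cross-product.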

The concept of NUMBER can be discovered by trying to find a canonical 
representation for all sets with the same length. Having the same length
can be noticed as a generalization of "having the same elements as".
By now, we're talking about concepts which are so elementary that it
is more painful to try to reduce their discovery than to quit and assume that
they are already known.

By reversing our chain of reductions, we have what I call a plausible
scenario for discovering prime numbers. It looks something like this:
<chain slide>

.B1

        ⊗7Canonical form⊗*        ⊗7complete-the-square⊗*        ⊗7look at the inverse⊗*    ⊗7extreme number of⊗*
  SETS ----------→ NUMBER ------------→ MULTIPLICATION ------------→ DIVISORS ----------→ PRIMES
  CROSS-PRODUCT ------↗

.E

Each node represents a discovery, and each arc is labelled with the
heuristic which made that discovery plausible, given the previous concepts.

Now suppose we analyze a great many mathematical discoveries, and collect 
several hundred such heuristic rules. We can use them not only to explain
a discovery, but to generate new ones. We start with a basic core of concepts,
and repeatedly pick a heuristic rule and try to apply it.  Notice that
this synthesis process is a lot like growing a tree, where the nodes are the
concepts, and the operators are these rules. 


Proceeding in this way, from primitive concepts like Sets, Subsets, and
Union, we could eventually discover the prime numbers, and many other interesting
concepts. If we actually got our heuristics by analyzing the discoveries
of certain concepts, then 
certainly the heuristics collected should be sufficient to guide you to re-discover
those concepts. But intuitively, we hope that these heuristics would
be truly general,
would lead to discovering many different worthwhile concepts and relationships.
This is not a trivial issue, and we'll discuss it Thursday.
What ⊗4is⊗* the justification behind this whole process? behind the choice of
initial core concepts?

If we really are growing the tree in this direction, notice
how broad it probably is: 
at any moment, several rules might be applicable, and each with several
possible choices of what to apply it to.
For example, our "look at the inverse" suggestion applies to all relations.
As another example, our "see what maps into extremal elements" heuristic
can tell us to seek not just those integers with a minimum number of divisors,
the primes, but also those integers which are ⊗4maximally⊗* divisible.
<draw on slide: downward arrow, to MAXIMALLY-DIVIS. Label the arc "extreme: max">


What's even more crucial, we don't know back here  <at divisors> that we're going to
discover something wonderful by taking this path rather than that one,
so how do we decide? All we can do is have some local evaluation criteria,
based primarily on past experiences, which guide our path at each node.
In research there is no "right" problem to investigate, no "correct"
theorem to propose, no specific goal. 
This is a key difference between problem-solving and problem-proposing:
the presence or absence of a goal. Such a concept-grower should
never stop; it makes no sense to talk about it "succeeding" or "failing."
You can rate the quality of its work, but can't demand that it discover
particular concepts.

The most one can dare to do is to compare its performance against 
known mathematics, or to somehow measure the rates at which interesting
new concepts -- and losers -- get created and worked on.
The automatic-discoverer is doing well so long as it continues to generate new,
interesting concepts.

Now let's run our imaginary system forward,
starting with the notions of prime numbers and maximally-divisible
numbers, as well as more elementary concepts.
Say the system decides to look at examples of the function Factorings-of.
So it constructs  entries, like <Factorings slide>
.B1

Factorings-of(18) = { (18,1),(9,2),(6,3),(3,3,2)}

and so on.

.E

It tries to notice some interesting pattern. One heuristic it has says that
"An operation is very interesting if each image is mildly interesting
in exactly the same way".
So how are each of these sets interesting in the same way?
Well, how are they interesting at all? We know that
"A set of lists is mildly interesting if, for some element L of the  set,
every element of L is a very unusual kind of object."
Another heuristic tells us that the most efficient way to look for these
properties is to start with the smallest set, namely this one.
Well, every member of this list is ⊗4odd⊗* and is a ⊗4prime⊗*; is it true that
each of these sets contains a list of all odd numbers? 
No. Well, there goes that conjecture. Well, does each of these sets
contain a list of all primes? 
Yes, each one does.
So we can conjecture: Factorings-of is very interesting because (it seems)
every integer has some way of factoring into all primes.
This is awkward to state, so let's define a new operation:
Prime-fac(n) is the set of all ways of factoring n into  primes.
So Prime-fac(18)={(3,3,2)}. <on slide, circle Prime-factoring for each n>
Then we can phrase our interesting 
conjecture as follows:
Prime-fac is defined for
all integers. A general heuristic we have is "if an operation has just been
defined, see if it is single-valued". It seems that, yes,
the decomposition of each integer into primes is unique. By the way, this is the
fundamental theorem of arithmetic. By making good new definitions, even
very subtle relationships like this one are reduced to "obvious" questions to ask.
We've just shown that the unique factorization conjecture can be encountered
as the simple question "is that relation, Prime-fac, a function?"
Not only do the heuristics lead in a natural way to new concepts, they also
lead smoothly to interesting conjectures.
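The whole development can be replayed in a few lines. Below is a sketch (helper names are mine, not AM's; and unlike the slide's (18,1), it omits trivial factors of 1): enumerate Factorings-of, keep the all-prime ones as Prime-fac, and check that Prime-fac is single-valued.

```python
# Enumerate Factorings-of(n), define Prime-fac as the all-prime
# factorings, and test whether Prime-fac is single-valued --
# i.e., the unique-factorization conjecture.
def factorings(n, smallest=2):
    """All non-decreasing tuples of integers >= smallest whose product is n."""
    result = [(n,)] if n >= smallest else []
    for d in range(smallest, int(n ** 0.5) + 1):
        if n % d == 0:
            result += [(d,) + rest for rest in factorings(n // d, d)]
    return result

def is_prime(k):
    return k > 1 and all(k % d for d in range(2, int(k ** 0.5) + 1))

def prime_fac(n):
    return [f for f in factorings(n) if all(is_prime(p) for p in f)]
```

Here `prime_fac(18)` yields just `[(2, 3, 3)]`, and checking every n up to a bound finds exactly one prime factoring apiece: the "is Prime-fac a function?" question answers yes.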

It's not too important that you follow every detail in these developments;
only the spirit of what I'm doing counts. I'm using the heuristics to guide
me in developing new concepts, and deciding what relationships to look for.

Let's look now at our maximally-divisible numbers, sort of the converse of
primes. Virtually no attention has been paid them in history, but it
turns out that the form of these numbers is necessarily 
p⊗B1⊗*⊗Aa1⊗*  p⊗B2⊗*⊗Aa2⊗* p⊗B3⊗*⊗Aa3⊗*...   p⊗Bk⊗*⊗Aak⊗*,  where the
p⊗Bi⊗*'s are the first k consecutive primes, 
so we may not skip any prime,
and the exponents a⊗Bi⊗*
decrease with i, and the ratio of (a⊗Bi⊗*+1)/(a⊗Bj⊗*+1) is
approximately (as closely as is possible for integers)
log(p⊗Bj⊗*)/log(p⊗Bi⊗*). For example, a typical divisor-rich number
is
n=2⊗A8⊗*3⊗A5⊗*5⊗A3⊗*7⊗A2⊗*11⊗A2⊗*13⊗A1⊗*17⊗A1⊗*19⊗A1⊗*23⊗A1⊗*29⊗A1⊗*31⊗A1⊗*37⊗A1⊗*41⊗A1⊗*43⊗A1⊗*47⊗A1⊗*53⊗A1⊗*.
The progression of its exponents+1 (9 6 4 3 3 2 2 2 2 2 2 2 2 2 2 2)
is about as close as one can get to satisfying the "logarithm"
constraint.
By that I mean that 9/6 is close to log(3)/log(2); that 2/2 is close to
log(47)/log(43), etc.  Changing any exponent by plus or minus 1 would make
those ratios worse than they are.
This  number n <point to n value> has about 4 million divisors.
The "AM  Conjecture" is that no  number smaller than n  has
that many divisors.
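The divisor count is easy to verify, since the number of divisors of p⊗B1⊗*⊗Aa1⊗*...p⊗Bk⊗*⊗Aak⊗* is the product of the (a⊗Bi⊗*+1):

```python
from math import prod

# The exponents of the divisor-rich n above, for the consecutive
# primes 2, 3, 5, 7, 11, 13, ..., 53.
exponents = [8, 5, 3, 2, 2] + [1] * 11
num_divisors = prod(e + 1 for e in exponents)
# num_divisors comes out to 3,981,312 -- "about 4 million", as claimed.
```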
This is so far the only piece of interesting mathematics, previously unknown, 
that was directly motivated by AM. AM is the
experimental system I've been working on which actually carries
out simple syntheses of new concepts.
Maybe the time has come to begin talking about that program.

Suppose you wanted to write a program which did this kind of heuristic
concept-growing. What exactly would you have to do?

Well, you have to decide on a body of primitive concepts, that is, a bunch
of notions which are considered already-known by the system. This will be
the starting state of knowledge.
Some of these concepts will be static, like "Sets" and "Conjectures", others will be
activities, like "Reverse-an-ordered-pair" or "Compose-2-relations".
Notice that simply executing some of the activities will produce new concepts.
For example, Composing the two relations Intersection and Complement,
we get the relation Intersection(x, Complement(y)) which is just Set-difference(x,y).
If this operation weren't in our core of concepts, it could then be added.
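That composition can be checked mechanically. A minimal sketch (the finite universe is assumed purely for illustration):

```python
# Verify that composing Intersection with Complement yields
# Set-difference: Intersection(x, Complement(y)) = x - y.
U = set(range(10))                  # a toy universe of discourse

def complement(y):
    return U - y

def intersection(x, y):
    return x & y

x, y = {1, 2, 3, 4}, {3, 4, 5}
set_difference = intersection(x, complement(y))
```

Here `set_difference` is `{1, 2}`, exactly `x - y`.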
Here are some of the concepts that AM actually started with:
<initial core slide>.

In addition to specifying the initial concepts, we must collect the heuristics
that the system will be guided by. One way to do this is to imagine the system
making various discoveries, and deciding what rules it will need to know.
Another, somewhat fairer way is to get experts to introspect on all the
clues they use, all the knowledge that guides their research efforts.
Unfortunately, many of these heuristics are demon-like, not even occurring to
the experts except in situations where they might be used. So it would be best to
prepare some exhaustive questionnaire for the experts, to systematically
put them in all conceivable situations, while they are introspecting.
We'll return to how to do this in a minute.

So now we have a list of concepts and a list of heuristics. How do we
actually implement this as a computer program?
What is left to do? We must make precise our representations and algorithms.
How is a concept to be represented? Precisely what is to be "given" initially
about each of these concepts? How are the heuristics stored? How are they
used? What is the control structure like?

First, try to think of what constraints there are on our answers to these
questions. What ⊗4do⊗* we have to tell the system about each concept?
Well, just its name isn't enough. Its definition, perhaps. Maybe some
abstract way of viewing this concept, a sort of "intuition" about it.
Maybe some examples of it; well no, the system should be able to fill them
in. In the case of an operation, maybe giving its domain and range would
be a good idea. 
Also, to help the system decide what to do, it would be nice to supply
some kind of rough judgment of the worth of this concept, its interest,
its usefulness, its simplicity, etc.

Let's leave that for the time being, and turn to the heuristics. What
requirements can we put on them? Well, it would be nice to have each one
come to the system's attention only when it plausibly could be used.
Sometimes, there will be special information that just pertains
to one kind of concept. For example, some of the heuristics tell whether a given
composition is interesting or not. There's no reason to access them unless
we're trying to judge a composition. In fact, by looking at the heuristics,
it appears that
only a few very weak ones are truly "global". Almost every single one can be
identified with a particular concept (and all specializations of that
concept). It's almost as though the relevant heuristics were just another
facet of each concept, like domain/range, examples, and definition.
In fact, we can even break the heuristics down into finer categories,
for each concept C, like heuristics that deal with checking a C,
heuristics that deal with creating a new C, those that deal with judging
how interesting a particular instance of C is, and so on.

Hmmm. That would be a nice way to think of things, since it provides us with the
framework we talked about to pick the experts' brains,
a kind of "questionnaire" structuring.
We just get them
to think of heuristics for each particular facet of each concept in turn.
There's no reason to put the resulting little bundles of heuristics together;
just leave them attached to the concepts' facets. Whenever some concept C is 
the center of attention, the only heuristics that the system need worry about 
are  those attached to C, or some generalization of C, and probably only those
attached to the relevant facets that the system is working on.

But what if we're thinking about performing an operation, like Composing
two relations. Then the heuristics from all 3 concepts have to work
together. How can we arrange that? Well, a rational AI answer is: give them
all the same representation, let them be simple cooperating knowledge
sources. For example, they could all be predicate calculus statements,
and combining them would mean conjoining them. 
But that would involve logical manipulations to conclude anything.
Or they could all be productions,
and combining them would mean simply dumping them together into one large
production system. I like that image better. We still have to decide what
the left and right hand sides of each production rule will look like.
But let that go for now.

Oh, another problem! Some of the heuristics apply not to concepts but to facets;
like heuristics for how and when  to fill in examples of anything. 
They apply to the whole notion of examples.
Well, maybe it won't hurt to consider giving the system a full-fledged concept
for each facet. So there would be some concept named Examples, some concept named
Definition, and so on, in addition to the kinds of static and operational concepts
we already decided to include. 
Then it makes sense to talk about some heuristics for dealing with Examples;
we simply tack them onto the concept named Examples.
Well, that's not so bad. 
Oh, but then
the set of facets a concept can have must be fixed once and for
all.  Otherwise we'd have to keep adding new concepts every time a new kind
of facet was wanted. So the set of possible facets is decided and fixed.
Well, that may actually be better than letting each concept have whatever facets
it wants. Since we know what the facets are going to be,
we can  collect knowledge about each of those facets.
So let's add these facets, as individual concepts, to the initial core
which the system starts
with. Some of this might be tricky, like asking for the definition
of a definition. Maybe the answer is that if it's that tricky we have no
business asking about it. If you know enough to ask for it, you already
understand it.

Well, I've stopped following what I'm saying, so let me go back to how to
implement the concepts.
We may as well make their facets explicit, say
as properties on a property list. Hmmm. Some of them are pretty trivial;
domain/range should be just a properly-formatted, static bit of data.
But some are very sophisticated; the algorithms part for instance.
They will have to be procedural. Well that's all right.
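To make the property-list idea concrete, here is a hypothetical sketch (the facet names and the worth number are invented for illustration; AM's actual facets differed): a concept is a property list with a fixed set of facets, most of them initially empty.

```python
# A concept as a property list: a fixed set of facet names,
# most of which start out unfilled.
FACETS = ("name", "definition", "domain_range", "algorithms",
          "examples", "worth", "heuristics")

def make_concept(name, **filled):
    concept = {facet: None for facet in FACETS}   # empty facets to fill in later
    concept["name"] = name
    concept.update(filled)
    return concept

primes = make_concept(
    "Primes",
    definition=lambda n: n > 1 and all(n % d for d in range(2, n)),
    worth=700,        # a rough numeric judgment of interest
)
```

The simple facets (like worth) are static data; the sophisticated ones (like the definition here) are procedures, just as the text suggests.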

What does the system actually do, now? It expands its knowledge, both by
creating new concepts and by finding out more about the already-known ones.
That means it creates new data structures like these, and it fills out
the properties of existing ones. In general, the creation of a new one is
directly suggested while filling in some detail somewhere.
So the only real top-level activity need be: pick some facet of some
concept, and try to find some new knowledge to put in that slot.
During this process, some new concepts may be created.
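That top-level activity can be sketched as an agenda loop (a simplification of AM's actual control structure, with invented task names): tasks are (priority, concept, facet) triples, and working on one task may propose new ones.

```python
import heapq

# A sketch of the top-level loop: repeatedly pick the best task
# -- "fill in this facet of this concept" -- and work on it.
# Working on a task may push follow-up tasks onto the agenda.
def run(agenda, fill_in, steps=100):
    done = []
    heapq.heapify(agenda)
    for _ in range(steps):
        if not agenda:
            break
        priority, concept, facet = heapq.heappop(agenda)
        done.append((concept, facet))
        for task in fill_in(concept, facet):   # new tasks (and concepts) arise here
            heapq.heappush(agenda, task)
    return done
```

For example, filling in examples of a concept might spawn a task to judge its worth; the agenda grows as the space of possible actions grows.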

What does it mean for a new one to be created?
Well, whoever created it should probably know its definition, and
something about its worth. Probably also how it connects to some
existing concepts. Once it's created, most of its facets will be empty,
so there will be many new things for the system to do. The space of
possible actions will grow very rapidly.

For those facets which are procedures, filling them in will mean writing
new little programs. Well, we know enough about automatic programming to
do that. The knowledge stored under the Algorithms concept, for example,
will be automatic programming techniques, which can use, say, the definition
of any concept to produce an executable algorithm for it.

Sometimes we'll find out that two concepts are really the same; in that case, I
suppose we can just merge the corresponding facets of each concept.
No problem there.

Well, in any event, that was how AM was designed and created. You now know
all the essentials of its representations and algorithms. 

The only piece missing from the picture is a concrete look at AM itself.
Tomorrow, if you want,
you can work with AM during the morning demo period.
I have some detailed examples prepared, 
and will discuss them tomorrow morning, around 10.
If you're interested, they'll help you get a feeling for
how the current version of AM actually does its stuff. 

Step back for a moment and look over what's been done.
Notice that this whole discussion is not really specific to doing math research.
The design for the concept-grower can be used to implement
almost any kind of theory formation system.
The only change from one program to another would be the particular
starting concepts, the particular facets each concept could have, and
of course the individual heuristics would change. So the design of the
system could be the same. Perhaps if several such systems were created,
they could be run together, and cooperate.

But how should we choose the particular scientific field for the theory formation?
Why was mathematics chosen?
Well, for one thing, mathematicians 
don't have to worry about uncertainties in their data, as
one would from, e.g., a mass spectrometer. A personal reason was that
I know something about mathematics,
so I wouldn't have to elicit the heuristics from experts outside the AI field; I
could use introspection. So that problem is eliminated. But the biggest
distinction is the idea that each ⊗4natural⊗* science tries to explain observed
data; mathematics is the only science not limited to explaining nature.
Ed and Josh came up with a similar list of distinctions about a decade ago,
but they viewed them as reasons for doing theory formation in chemistry 
and ⊗4not⊗* graph theory.
"Choosing a good domain" is one of the topics we'll discuss on Thursday.

To prepare for that, let me stress today that 
mathematics is an empirical science.
This is something which is hard to believe unless you've done some
math research. You quickly find that it has very little to do with the
smooth flowing proofs in textbooks. It's more like physics, where you
gather some data, look for regularities, and induce some conjectures.
But often they can't be phrased concisely with the existing
terminology. This is a clue that some new definitions should be made.
Or perhaps some concept is too rigid, too narrow to permit some conjecture
to hold. Then we can modify it, generalize or relax it in some way.
So if anything, math research proceeds in almost the opposite order from
the way it is finally reported in books and journals.

Now let's see what lessons we can learn from AM, that apply to
programs you might be working on.
If you try to imagine the number of possible avenues to explore, for any
given state of mathematical knowledge, the mind boggles.
It is sheer folly to
even think of walking around in it unguided.  Heuristic rules
-- like the kind that AM contains -- can
transform this so that the only legal moves are those which are
locally plausible, those which can be justified by some general heuristic.
This is like constraining the generator in any heuristic search
program.
But even this isn't enough; some meta-heuristics are needed to decide
which of all the legal heuristics to apply in a given situation, and how.
Experimental results from AM indicate that if the heuristics are good,
the meta-heuristics can be very simple, like numerical weights, 
and no meta-meta-heuristics at all are needed.
It would be interesting to see if that is true in some sense for other
tasks as well.

One effective technique for learning is to be exposed to others' mistakes.
So let me mention some of the problems I had programming AM.
One  problem we all face is how to cope with changes in the system's knowledge
base. The standard solution is to track down all the effects of each change, and
update the whole data base. The solution AM uses, and perhaps people too, is to
just record the changed information, and ⊗4not⊗* to update everything.
When a contradiction of some kind occurs, then the conflicting knowledge
is checked, the knowledge ⊗4it⊗* was based on is checked, 
and so on, until we encounter
some changed opinion which explains the disagreement.
So the system is allowed to hold contradictory
viewpoints, so long as it could resolve any dispute if it had to.
This is one solution you might keep in mind for your knowledge-based systems.
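Here is a minimal sketch of that policy (the class and method names are mine; this is a simplification, not AM's actual mechanism): each belief records what it was derived from, retractions are merely noted, and supports are traced only when a conflict actually surfaces.

```python
# Lazy change-handling: record changes but do NOT propagate them.
# When a contradiction appears, trace back through supports until
# some retracted belief explains the disagreement.
class KB:
    def __init__(self):
        self.support = {}        # belief -> beliefs it was derived from
        self.retracted = set()   # changed opinions, noted but not propagated

    def add(self, belief, *supports):
        self.support[belief] = supports

    def retract(self, belief):
        self.retracted.add(belief)   # just record the change

    def explain_conflict(self, belief):
        """Walk the support chain until a retracted belief is found."""
        frontier = [belief]
        while frontier:
            b = frontier.pop()
            if b in self.retracted:
                return b
            frontier.extend(self.support.get(b, ()))
        return None
```

So a belief derived from a since-retracted one stays in place, but any dispute involving it can be resolved on demand by following its supports back to the change.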

Of course ⊗4the⊗* standard problem in our business is the tremendous gap
between English prose that sounds plausible, that will convince a human,
and LISP code that
will run on a PDP10. I can't just say that a Composition is interesting if
its domain and range sets coincide; I have to specify precisely how
interest is calculated and used, and in particular how to compute the
interest value for such a composition.
The precision I need to instruct the machine still catches me unaware sometimes,
and I find myself underestimating the time I'll need to program some idea
I was sure would be a cinch.
This problem can be summed up as "Just because you can talk about something doesn't 
mean you understand it".  
There is no solution to this. It is one reason that we actually write these
programs.  
It implies that we can talk about more than
we really understand.
Which suggests to me that it's about time for me to stop talking.

There are several other questions which should be discussed sometime, like
how to measure AM's performance, what the justification is for its
particular initial core of knowledge, what the role of the user is,
what it would take to be able to do research in various areas of mathematics,
what characteristics make a task ripe for attack by AI methods,
what considerations come into play when choosing a name for the system,
and many more.
On Thursday, 
I'll talk about some more of these problems, and those we miss can be
covered at a SIGLUNCH sometime. I'll also
go into more detail about the state that AM is
actually in, and we can discuss plans for its future.
Let me leave the rest of ⊗4this⊗* hour open for questions or discussion.